Abstract

Introduction: The increasing number of targeted therapies, together with a deeper understanding of cancer 
genetics and drug response, has prompted major healthcare centers to implement personalized treatment
approaches relying on high-throughput tumor DNA sequencing. However, the optimal way to implement this 
transformative methodology is not yet clear. Current assays may miss important clinical information such as the 
mutation allelic fraction, the presence of sub-clones or chromosomal rearrangements, or the distinction between 
inherited variants and somatic mutations. Here, we present the evaluation of ultra-deep targeted sequencing 
(UDT-Seq) to generate and interpret the molecular profile of 38 breast cancer patients from two academic medical 
centers. 

Methods: We sequenced 47 genes in matched germline and tumor DNA samples from 38 breast cancer patients. 
The selected genes, or the pathways they belong to, can be targeted by drugs or are important in familial cancer 
risk or drug metabolism. 

Results: Relying on the added value of sequencing matched tumor and germline DNA and using a dedicated 
analysis, UDT-Seq has a high sensitivity to identify mutations in tumors with low malignant cell content. 
Applying UDT-Seq to matched tumor and germline specimens from the 38 patients resulted in a proposal for 
at least one targeted therapy for 22 patients, the identification of tumor sub-clones in 3 patients, the 
suggestion of potential adverse drug effects in 3 patients and a recommendation for genetic counseling for 
2 patients. 

Conclusion: Overall our study highlights the additional benefits of a sequencing strategy, which includes 
germline DNA and is optimized for heterogeneous tumor tissues. 



Introduction 

The use of highly effective targeted therapies in cancer 
frequently depends on the specific mutational profile of 
the tumor. As an increasing number of targeted therapies 
become available, determining the comprehensive genetic 
profile of a tumor is critical in understanding the response 
to targeted drugs for cancer treatment. Indeed, this gen- 
etic profile can help predict sensitivity or resistance to 

* Correspondence: oharismendy@ucsd.edu; hantoncu@uci.edu; 
kafrazer@ucsd.edu 

^Division of Genome Information Sciences, Department of Pediatrics and Rady 
Children's Hospital, University of California San Diego, 9500 Gilman Drive, La 
Jolla, 92093, CA, USA 

^Department of Epidemiology, School of Medicine, University of California 
Irvine, Irvine, CA, USA 

Full list of author information is available at the end of the article 



particular therapies and can therefore offer new, tailored 
treatment options to patients with late-stage or recurrent 
disease. In breast cancer, for example, trastuzumab has 
been used for Her2 amplified or overexpressing breast 
cancer. Notably, this strategy may suggest the use of a 
drug indicated for another anatomic cancer type, or the 
use of an investigational drug. Measuring the true clinical 
benefit of this tailored strategy is difficult, however, be- 
cause targeted therapy frequently leads to drug resistance, 
the mechanisms of which are often not well understood. 
Nevertheless, this area of research is developing rapidly 
and some preliminary studies matching therapy to the 
tumor mutational profile across many clinical trials show 
an improved response rate [1].

Traditionally, several types of molecular assays are
available to identify somatic DNA mutations in tumors.
Such assays analyze single positions, single exons, or
whole genes using mass spectrometry [2], allele-specific
polymerase chain reaction (PCR) [3] or Sanger sequencing.
These assays are, however, limited in scope (looking
only at specific genes or mutations) and limited in
sensitivity (usually dependent on the fraction of tumor
cells contained in the tissue specimen). More recently,
high-throughput sequencing of candidate genes has ex- 
tended the breadth and sensitivity of this approach, 
overcoming some of these drawbacks [4-7]. Some major 
clinical centers are now starting to use more comprehensive
molecular profiling in clinical care. However,
these assays differ with regard to breadth (number of
genes), depth (number of independent DNA molecules
sampled) and design (selection of the genes or inclusion
of a matched germline control). As a consequence,
the clinical utility may vary. The Cancer Genome Atlas 
(TCGA) [8], a consortium focused on research and dis- 
covery, sequenced the entire exome of tumors but at 
limited coverage depth, rejecting specimens with less 
than 60% cellularity and preventing the reliable identifica- 
tion of subclonal mutations. More targeted commercial 
assays such as Foundation One (Foundation Medicine, 
Cambridge, MA) may generate increased coverage depth 
of a smaller set of genes but do not always report the mu- 
tant allelic fraction [9]. Such diagnostic services also omit 
the comparison with a matched germline control, which is 
essential to increase the analytical sensitivity and distin- 
guish between inherited variants and somatic mutations. 

Ultra-deep targeted sequencing (UDT-Seq) [5,10] of 
matched tumor-germline specimens has not yet been 
evaluated in a clinical setting. The sequencing of matched 
tumor-germline samples is crucial to distinguish somatic 
mutations from sequencing artifacts; it is also critical to 
establish with certainty that a variant identified in the 
tumor is somatic rather than inherited since filtering 
against polymorphism databases can eliminate real muta- 
tions [11]. In the absence of a matched germline DNA se- 
quence, the misinterpretation of an inherited variant for a 
somatic mutation could potentially prevent a patient from 
getting appropriate genetic counseling. Additionally, inhe- 
rited variation in metabolism genes such as DPYD or 
CYP2D6 has been associated with 5-fluorouracil toxicity 
and possibly tamoxifen efficacy [12], respectively, and, al- 
though the variants are rare, a more systematic clinical 
screening would provide important benefits. The simul- 
taneous sequencing of the germline DNA along with the 
tumor DNA therefore offers technical advantages to iden- 
tify somatic mutations at low allelic fraction and increases 
the opportunity to identify actionable inherited variants. 

Here, we evaluate a targeted sequencing assay for its use 
in a cancer clinical setting. Specifically, we performed 



UDT-Seq of 47 genes that are clinically actionable or im- 
portant for patient care. We show that potentially import- 
ant information is gained by sequencing at high depth, 
including identification of subclonal mutations. Additional 
information is also gained from the sequencing of matched 
germline DNA and from the inference of tumor DNA copy 
number alterations. We therefore demonstrate that in com- 
parison with other high-throughput sequencing methods, 
UDT-Seq of matched tumor-germline DNA used in a clin- 
ical setting generates more potentially actionable findings 
for a greater number of patients. 

Methods 

Clinical specimens 

All University of California, San Diego and University of
California, Irvine patients were consented in accordance
with the protocols approved by their respective Institutional
Review Boards (Table S1 in Additional file 1
and Additional file 2). Snap-frozen tissue samples were
subjected to mechanical pulverization, followed by disruption
of the tissue in lysis buffer and DNA/RNA extraction
using AllPrep DNA extraction kits (Qiagen GmbH, Hilden,
Germany) according to the manufacturer's recommendation.
Germline DNA was extracted from blood clots using
Qiagen Clotspin Baskets and QIAamp DNA Blood
Maxi kits (Qiagen Inc., Valencia, CA, USA) and from
saliva samples according to the respective manufacturer's
protocol.

Data generation 

The data were generated according to our published UDT-Seq
method [5,10]. Briefly, the genomic DNA samples
were fragmented to an average size of 3 kb. To prepare the
input DNA template mixture for targeted amplification,
1.5 μg of the purified genomic DNA fragmentation reaction
was added to 9.4 μl of 10× High-Fidelity Buffer
(11304-029; Invitrogen (Carlsbad, CA, USA)), 2.5 μl of
50 mM MgSO4, 2.5 μl of 10 mM dNTP, 7.2 μl of 4 M betaine,
7.2 μl RDT Droplet Stabilizer, 3.6 μl dimethyl sulfoxide
and 1.4 μl of 5 units/μl Platinum High-Fidelity Taq (Invitrogen),
and the samples were brought to a final volume of
50 μl with nuclease-free water. The primer droplets (Table
S2 in Additional file 1 and Additional file 2) were merged
with the sample droplets on the RDT1000 (RainDance
Technologies (Billerica, MA, USA)).

The PCR reactions were carried out as follows: initial
denaturation at 94°C for 2 minutes; 55 cycles at 94°C for
30 seconds, 54°C for 30 seconds and 68°C for 60 seconds;
and final extension at 68°C for 10 minutes, followed by a
4°C hold. After breaking the emulsion and purification
of the amplicons, the samples were subjected to the secondary
PCR using 0.5 μM final concentration of a universal
forward primer and an index-specific reverse
primer (Table S3 in Additional file 1). Samples were
amplified as follows: initial denaturation at 94°C for 2 minutes;
10 cycles at 94°C for 30 seconds, 56°C for 30 seconds
and 68°C for 1 minute; and final extension at 68°C
for 10 minutes, followed by a 4°C hold. The purified amplified
library was then analyzed on an Agilent Bioanalyzer to
quantify final amplicon yield and pooled in equimolar
amounts. The pool was loaded at between 8 and 11 pM (depending
on the run) and sequenced on the Illumina (San
Diego, CA, USA) MiSeq sequencing instrument for 2 ×
150 cycles using custom sequencing primers (Table S3 in
Additional file 1). The resulting reads were deconvoluted
based on their index sequence. The raw reads are publicly
available through the Sequence Read Archive at the
NCBI: SRA067610 and SRA067611. The libraries were sequenced
to an average of 3.1 million 151 bp long paired-end
reads per sample (Table S4 in Additional file 1).

Data analysis 
Mutascope 

The analysis was performed using Mutascope, which is capable of
detecting mutations at 1% allelic fraction with high sensitivity
[10]. We first identified potential false positive variants
(module makeBlackList). We then aligned the reads to the
human genome (modules runBWA, refinement, groupRealign,
and xpileup). Mutascope calculates the error rate for
each position/substitution/strand group (module calcErrorRates)
at positions that are not in the database of single
nucleotide polymorphisms (dbSNP) and uses this
to calculate the binomial probability of mutations in the
tumor (module callSomatic), distinguishing somatic from
germline variants using an additional Fisher exact test. Finally,
likely false positive mutations were filtered out using
coverage bias, read-group bias, ambiguity of alternate allele,
mapping quality, alternate allele quality, and proximity to
an indel or to a homopolymer.
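
The two statistical steps described above can be sketched as follows. This is a minimal illustration and not Mutascope's actual code: the function names, the significance level and the example read counts are all assumptions made for the sketch. It computes the upper-tail binomial probability of observing the alternate reads given a per-position error rate, then applies a one-sided Fisher exact test to tumor versus germline read counts.

```python
from math import exp, lgamma, log


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def binomial_tail(alt_reads, depth, error_rate):
    """P(X >= alt_reads) if each of `depth` reads errs with `error_rate`.

    Assumes 0 < error_rate < 1.
    """
    return sum(
        exp(log_comb(depth, i) + i * log(error_rate)
            + (depth - i) * log(1 - error_rate))
        for i in range(alt_reads, depth + 1)
    )


def fisher_one_sided(t_alt, t_ref, g_alt, g_ref):
    """One-sided Fisher exact P value: alternate allele enriched in tumor."""
    total_alt = t_alt + g_alt
    n_tumor = t_alt + t_ref
    n_total = n_tumor + g_alt + g_ref
    log_denominator = log_comb(n_total, n_tumor)
    return sum(
        exp(log_comb(total_alt, x)
            + log_comb(n_total - total_alt, n_tumor - x)
            - log_denominator)
        for x in range(t_alt, min(total_alt, n_tumor) + 1)
    )


def is_somatic(t_alt, t_ref, g_alt, g_ref, error_rate, alpha=0.05):
    """Hypothetical decision rule; alpha is an illustrative assumption."""
    above_noise = binomial_tail(t_alt, t_alt + t_ref, error_rate) < alpha
    tumor_specific = fisher_one_sided(t_alt, t_ref, g_alt, g_ref) < alpha
    return above_noise and tumor_specific


# A variant at 1% allelic fraction seen almost only in the tumor.
print(is_somatic(15, 1485, 1, 1499, error_rate=0.001))
# An inherited heterozygous variant present in both samples.
print(is_somatic(750, 750, 740, 760, error_rate=0.001))
```

The second call shows why the matched germline matters in this sketch: the variant clearly exceeds the error rate, but the tumor and germline counts are similar, so it is not reported as somatic.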

Copy number alterations 

The average number of reads per gene was calculated for 
each sample sequenced. We then computed the mean and 
standard deviation of the normalized coverage in the germ- 
line DNA for each patient at each gene. The significance of 
amplification or deletion of a specific gene in the tumor 
DNA was estimated by comparing the tumor normalized 
coverage to the distribution of normal normalized coverage 
at this gene for all patients, using the R function pnorm.
Following the Bonferroni correction for multiple testing, we
reported amplifications (log2 > 1) and deletion (log2 < -1)
events with P < 5.6 × 10⁻⁵ (Table S5 in Additional file 1).
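
A rough sketch of this coverage-based test is given below. It uses Python's `statistics.NormalDist` as a stand-in for the R function pnorm. The function names, the two-sided form of the test, the choice to take the log2 ratio against the matched germline sample, and the example numbers are assumptions made for illustration only.

```python
from math import log2
from statistics import NormalDist, mean, stdev


def normalize(reads_per_gene):
    """Convert per-gene read counts to fractions of the sample's total reads."""
    total = sum(reads_per_gene.values())
    return {gene: count / total for gene, count in reads_per_gene.items()}


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def call_copy_number(tumor, matched_germline, cohort_germline, threshold):
    """Classify one gene in one tumor from normalized coverage values.

    `cohort_germline` holds the normalized germline coverage of this gene
    across all patients and defines the reference normal distribution.
    """
    reference = NormalDist(mean(cohort_germline), stdev(cohort_germline))
    lower_tail = reference.cdf(tumor)
    p_value = 2 * min(lower_tail, 1 - lower_tail)  # two-sided: an assumption
    ratio = log2(tumor / matched_germline)
    if p_value < threshold and ratio > 1:
        return "amplification"
    if p_value < threshold and ratio < -1:
        return "deletion"
    return "neutral"


# Hypothetical normalized germline coverage of one gene in six patients.
cohort = [0.020, 0.021, 0.019, 0.0205, 0.0195, 0.020]
threshold = bonferroni_threshold(0.05, 1000)  # illustrative test count
print(call_copy_number(0.060, 0.020, cohort, threshold))
print(call_copy_number(0.0202, 0.020, cohort, threshold))
print(call_copy_number(0.008, 0.020, cohort, threshold))
```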

Variant annotation 

Variants were queried against dbSNP135 to determine novel 
or known variants. We next used snpEff [13] version 2.0.5
in combination with GATK VariantAnnotator (Broad Institute,
Cambridge, MA, USA), both with default parameters,
to identify the different functional impacts on coding genes.
We enriched this annotation by cross-referencing the list of 
variants to the dbNSFP database [14], which provides con- 
servation (PhyloP), functional prediction (SIFT, PolyPhen 
and MutationTaster), as well as Uniprot codon change infor- 
mation. Finally, we annotated the variants for presence in 
Catalogue of Somatic Mutations in Cancer v61 (Wellcome
Trust Sanger Institute, Hinxton, UK) based on coordinate 
and genotype. Notably, we used Catalogue of Somatic Muta- 
tions in Cancer codon numbering when discordant number- 
ing was reported between databases. 

Results 

We collected 38 tumors, including two invasive lobular carcinomas,
35 invasive ductal carcinomas (six of which showed
lobular features) and one ductal carcinoma in situ. Notably,
four tumors had cellularity lower than 20% (Figure 1)
and six tumors were Her2-positive as determined by standard
testing (Table S1 in Additional file 1 and Additional file
2). We assembled a panel of 47 genes to analyze these specimens
using UDT-Seq. The genes were selected for their
clinical importance or their relevance to breast cancer genet- 
ics and treatment (Table 1). The coverage resulting from the 
sequencing of the 1,736 amplicons from 38 pairs of tumor 
DNA and germline DNA was deep (Table S4 in Additional 
file 1; with an average of 1,481 reads per amplicon), sensitive 
(with 92% of the bases covered at 500× or more) and highly
uniform (with an average of 92.6% of the bases within two- 
fold of the mean) - in agreement with the published specifi- 
cations of microdroplet PCR [5,10,15], which provides high- 
quality data for clinical sequencing. 
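
The three coverage figures quoted above (mean depth, fraction of bases at or above a depth cutoff, and fraction of bases within two-fold of the mean) can be computed as in the sketch below. The function name is hypothetical, the depths are made-up values, and reading "within two-fold of the mean" as the interval from half the mean to twice the mean is an assumption.

```python
def coverage_summary(depths, min_depth=500):
    """Summarize per-base coverage depths.

    Returns (mean depth, fraction of bases >= min_depth,
    fraction of bases between mean/2 and 2*mean inclusive).
    """
    n_bases = len(depths)
    mean_depth = sum(depths) / n_bases
    deep_enough = sum(d >= min_depth for d in depths) / n_bases
    within_twofold = sum(
        mean_depth / 2 <= d <= 2 * mean_depth for d in depths
    ) / n_bases
    return mean_depth, deep_enough, within_twofold


# Hypothetical per-base depths for a tiny region.
example_depths = [1000, 1200, 800, 400, 1600]
print(coverage_summary(example_depths))
```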

Chromosomal alterations 

The precise allelic fraction measured at each sequenced 
position by UDT-Seq can be reflective of the prevalence 







Figure 1 Histology examination. For each sample, the proportion 
of necrosis, immune cells, stromal cells, in situ tumor and invasive 
tumor is indicated.

Table 1 Genes included in the breast cancer panel sequenced by UDT-Seq

Gene | Rationale for inclusion (a) | Molecular eligibility for clinical trial | US FDA-approved pathway inhibitor / Pathway inhibitor in clinical testing
PIK3CA | S | Y | Y
PTEN | S, G | Y | Y
BRAF | S | | Y Y
KRAS | S | | Y
EGFR | S | | Y Y
ALK | S | Y | Y Y
ERBB2 | S | Y | Y Y
JAK2 | S | | Y
PDGFRB | S | | Y
RET | S, G | | Y
JAK1 | S | | Y
RARA | S | | Y
TP53 | S, G | |
CDH1 | S | |
GATA3 | S | |
CTNNA1 | S | |
RB1 | S, G | |
CDKN2A | S | | Y
AKT1 | S | | Y Y
APC | S, G | |
PIK3R1 | S | |
BRCA1 | S, G | | Y
ERBB3 | S | |
JAK3 | S | | Y
NOTCH1 | S | | Y
MET | S, G | | Y Y
FGFR2 | S | Y | Y
ABL2 | S | |
BRCA2 | S, G | | Y
CTNNB1 | S | |
ERBB4 | S | | Y
FGFR1 | S | Y | Y
FGFR1OP | S | |
PALB2 | S, G | |
TOP1 | S | | Y
DPD | P (capecitabine/5-fluorouracil) | |
TPMT | P (6-mercaptopurine, thioguanine) | |
CYP2D6 | P (tamoxifen (+/-)) | |
CYP2C9 | P (warfarin) | |
VKORC1 | P (warfarin) | |
CFTR | R (cystic fibrosis) | |
MLH1 | G | |
MSH2 | G | |
MSH6 | G | |
PMS2 | G | |
CHK2 | | |
ATM | | |

FDA, Food and Drug Administration; UDT-Seq, ultra-deep targeted sequencing.
(a) S, somatic mutations; G, germline cancer risk; P, pharmacogenetic risk;
R, reproductive significance.



of a mutated clone in the tumor sample, but can also re- 
sult from chromosomal losses or gains. Therefore it is 
important to first identify these chromosomal alterations
to interpret the mutations' allelic fraction but also to re- 
veal potential actionable events such as the amplification 
of a targetable oncogene. 

As shown previously, the distribution of the fractions 
of reads per amplicon generated by UDT-Seq is highly 
reproducible from sample to sample [5]. As a result, the 
difference in coverage depth of an amplicon between 
tumor and germline can be indicative of chromosome 
copy number gains or losses. Indeed, we noticed that 
five of the six samples determined by traditional 
methods (immunohistochemistry or fluorescent in situ 
hybridization) to have Her2 amplification show a higher 
coverage depth at ERBB2 amplicons, the gene coding for 
Her2 (Figure 2A). The immunohistochemistry or fluorescent
in situ hybridization score is correlated with the
level of amplification determined by this approach (r² =
0.70; Figure 2B). We also identified potential copy
number gains of ABL2, BRAF, FGFR2 and PIK3CA in
one sample, FGFR1 in two samples, as well as a loss
of FGFR1OP in one sample (Figure 2A; Table S5 in
Additional file 1). Despite the high coverage depth
generated, the low tumor cell content and overall level
of gene amplification in a sample can reduce the sensitivity
of this approach, as illustrated by a false negative Her2-amplified
sample, which had a low in situ hybridization ratio
(2.8) and 50% tumor cell content. Nevertheless, this inference
of copy number alterations can identify bona fide
actionable events.

The high depth of sequencing of both tumor and 
germline also facilitates the identification of loss of hetero- 
zygosity events, by measuring the allelic fraction of het- 
erozygous polymorphisms in the tumor (Figure 2C,D). 
This observed effect on allelic fraction is, however, a 
combination of tumor purity and ploidy that is difficult 
to separate using only ~150 germline variants per patient.
We can summarize this instability using the standard
deviation of the allelic fraction of the heterozygous

[Figure 2 image. Recoverable labels: heatmap rows are the panel genes, color scale Log2(T/N); scatterplot x axis, Germline Allelic Fraction; example annotation SDH = 30.2; remaining panel axes, Grade and Maximum Cellularity; legend IDC, ILC, lob. feat.]

Figure 2 Somatic rearrangements. (A) Heatmap representing the average log2 ratio of tumor/germline coverage depth observed on all
amplicons of a given gene (rows) in the sequenced samples (columns). Red, gains; blue, losses. Black frames indicate significant changes
(P < 5.6 × 10⁻⁵). (B) The log2 ratios of tumor/germline coverage depth of the Her2 gene correlate with the results of immunohistochemistry.
(C), (D) Scatterplot representing the allelic fraction of the germline variants in the germline DNA (x axis) and tumor DNA (y axis) for tumors
showing a low (C) or high (D) level of chromosomal instability. The standard deviation of heterozygotes (SDH) score, calculated from the
standard deviation of the allelic fraction of heterozygous single nucleotide polymorphisms in the tumor, is indicated. (E) Distribution of SDH
scores in the sequenced cohort as a function of histological grade (x axis). Invasive lobular carcinoma (ILC; red) and invasive ductal carcinoma
(IDC) showing lobular features (orange) are indicated. (F) Cumulative fraction of tumors with high SDH score (y axis), at increasing tumor
cellularity (x axis). IHC, immunohistochemistry.



single nucleotide polymorphisms observed in the tumor 
(standard deviation of heterozygotes (SDH) score; 
Figure 2E). The SDH score was correlated with the Nottingham
grade (P < 0.005, Student's t test), indicating that
high-grade tumors have more chromosomal rearrangements,
especially for ductal carcinomas in situ. Similarly,
for highly cellular tumors, a high SDH score is indicative 
of a high chromosomal instability. As expected, a higher 
fraction of elevated SDH score was observed in high cellu- 
larity samples (Figure 2F), indicating that chromosomal 
instability is more difficult to identify in heterogeneous 
samples using our approach. As described below, the 
identification of loss of heterozygosity events is important 
for the interpretation of the allelic fraction at somatic 
mutations. 
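
The SDH score described above can be sketched in a few lines. This is an illustrative reading, not the authors' code: the function name, the allelic fraction window used to call a germline variant heterozygous (30% to 70%) and the use of the sample standard deviation are assumptions. Allelic fractions are expressed in percent.

```python
from statistics import stdev


def sdh_score(variants, het_range=(30.0, 70.0)):
    """Standard deviation of tumor allelic fractions at heterozygous SNPs.

    `variants` is a list of (germline_fraction, tumor_fraction) pairs in
    percent. Only variants whose germline fraction falls inside
    `het_range` are treated as heterozygous (an assumed cutoff).
    """
    low, high = het_range
    tumor_fractions = [
        tumor for germline, tumor in variants if low <= germline <= high
    ]
    if len(tumor_fractions) < 2:
        raise ValueError("need at least two heterozygous variants")
    return stdev(tumor_fractions)


# Hypothetical tumor with little allelic imbalance; the homozygous
# variant (100, 100) is ignored.
stable = [(50, 49), (48, 51), (52, 50), (100, 100)]
# Hypothetical tumor where heterozygous sites drift far from 50%.
unstable = [(50, 20), (50, 80), (50, 50)]
print(sdh_score(stable), sdh_score(unstable))
```

In a tumor without allelic imbalance the heterozygous sites stay close to 50% and the score is small; loss of heterozygosity or copy number changes spread the tumor fractions and raise it.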

Tumors' mutational landscape 

We identified somatic variants, substitutions and inser- 
tion/deletions in the sequenced samples using Muta- 
scope [10]. Four patients had no mutations, and 34 
patients had between one and 12 nonsilent mutations 
(one to 16 total mutations). In total, we identified 76 
somatic variants across the 34 cases, of which 62 were 



nonsilent, resulting in a coding change in 28 genes 
(Table S6 in Additional file 1). 

To highlight the specificities of the patient cohort and 
the sequencing assay, we compared our results with 
those obtained from a large TCGA cohort of 507 breast 
invasive carcinomas that were sequenced at all coding 
genes [8]. We observed that 17% of the TCGA samples 
had no detectable mutations in the 47 genes of our 
panel, as compared with the 10% of samples with no de- 
tectable mutations determined by our approach (Figure 3 
inset). Similarly, there were three or more somatic muta- 
tions in 18% of the samples in our study compared with 
only 8% in the TCGA dataset. Thirty-nine of the 41 
genes mutated either in our study or in the TCGA dataset 
were mutated in the same fraction of samples (P >0.05; 
Figure 3). Only ERBB2 and PMS2 showed a significant difference
(P <0.05), although the large difference in sample
size could weaken this comparison. Altogether, these 
observations suggest our approach has a greater sensi- 
tivity to detect mutations in potentially clinically action- 
able genes. 

The most frequently mutated gene, TP53, was altered in 
37% (14/38) of the patients. In six patients, the mutation

[Figure 3 image. Recoverable labels: legend UDT-Seq and TCGA; one bar pair per gene, with * marking ERBB2 and PMS2; x axis, Fraction of Samples; inset x axis, Number of Non Silent Mutations (0, 1, 2, >3)]


Figure 3 Comparison with The Cancer Genome Atlas cohort. 

Bar graph representing the fraction of samples with nonsilent 
somatic mutations in The Cancer Genome Atlas (TCGA) cohort 
(n = 507, blue) and the studied cohort (n = 38, red). *Statistically
significant difference (Fisher test P <0.05). Inset: bar graph indicating 
the fraction of samples with none, one, two, or three or more 
nonsilent mutations over the entire TCGA cohort (blue) or studied 
cohort (red). UDT-Seq, ultra-deep targeted sequencing. 



was homozygous, leading to a frameshift (n = 1), a nonsense
(n = 3) or a missense (n = 2) change, supporting the total
loss of function of TP53 in these cases. In one patient,
three missense mutations (P142L, P158L and R158C)
were present on the same DNA strand, indicating that
one TP53 allele remained wild-type. The remaining seven
patients had heterozygous mutations, which were all predicted
to be deleterious. Interestingly, we noticed TP53
mutations with high allelic fraction in low cellularity tumors
(Figure 4A, red box). Assuming that the adjacent tissue
sections used for histology and sequencing have
comparable cellularity, this suggests that TP53 mutations 
may be present in the surrounding stroma, consistent with 
previous observations [16-19]. 



The second most frequently mutated gene, PIK3CA, 
was mutated in 24% (9/38) of the patients. All of the 
mutations occurred in mutational hotspots known to result
in a phosphoinositide-3 kinase (PI3K) gain of function:
E545K (n = 4), H1047R (n = 3), E542K (n = 1) and
C420R (n = 1) [20]. In contrast to TP53, the allelic fraction
of PIK3CA mutants was proportional to the tumor
cellularity (Figure 4B), with the exception of two tumors 
(Figure 4B, red box) of high cellularity (>80%) and lower 
PIK3CA mutant allelic fraction (<30%), indicating that 
the mutations may have been present in only a subset of 
the tumor cells. 

GATA3 was found mutated in 16% (6/38) of the pa- 
tients. Interestingly, five out of the six mutations led to a 
frameshift, consistent with the findings of the TCGA 
(88%, 38/43) and much higher than the initial GATA3 
mutational analysis performed by Sanger sequencing in 
breast cancer (30%, 2/6) [21]. The frameshift mutations 
in this transcription factor occurred in the vicinity of the 
Zn Finger domain (residues 263 to 313), which also sur- 
rounds the nuclear localization signal [22] . The mutations 
may therefore result in a loss of function by preventing 
DNA binding or nuclear import. The unique mutational 
profile of GATA3, dominated by frameshift mutations, 
may prompt further investigations about their mechanism 
of onset and significance. 

We also identified less frequently mutated genes with 
potential value in the clinic. One patient's tumor was determined
to harbor a PIK3R1 K567E mutation, which has
been observed in endometrial cancer [23]. Although the
significance of this particular substitution is not known,
loss of function mutations of the regulatory subunit of the
PI3K complex can contribute to the activation of the PI3K
pathway [24]. Similarly, the PTEN frameshift mutation
identified in another patient's tumor may result in partial
PTEN loss of function and subsequent PI3K activation.
Three patients carried missense mutations in ERBB2, all
predicted to affect its function. Two of these mutations
were located in the kinase domain and are known to mediate
resistance to lapatinib (L755S) [25] or to activate
Her2 (D769H) [26]. Finally, we identified four mutations
in CDH1 in three tumors. Interestingly, two tumors were
diagnosed as lobular cancer and one had lobular features,
in agreement with the increased prevalence of E-cadherin
loss (encoded by CDH1) in lobular breast cancer [27].

Tumor subclonal populations 

While 35/38 patients had between zero and three som- 
atic mutations, three patients had more than three mu- 
tations. Because of the high sequencing coverage depth 
(> 1,000-fold), we were able to identify subclonal cell 
populations in these tumors (Table S7 in Additional file 1; 
Figure 4C). We identified one patient with 12 nonsi- 
lent mutations, which corresponds to about 10 times the

Figure 4 Patterns and actionability of somatic mutations. (A), (B) The allelic fraction of all TP53 (A) and PIK3CA (B) nonsilent somatic mutants
(y axis) is displayed as a function of the cellularity of the tumor (x axis). Red boxes indicate samples where the allelic fraction deviates from tumor
cellularity. (C) The allelic fraction of the nonsilent somatic mutations in the three tumors showing evidence of two subclones is displayed as a 
function of the tumor cellularity (x axis). Inset: highlighting the distribution of allelic fraction of the mutations identified in the two clones of 
AA952. (D) Schematic representation of the type of somatic variation identified in the genes actionable for their somatic status. The tumor 
cellularity is displayed in a purple gradient color. The samples are ranked by decreasing number of actionable somatic mutations. 



average mutation rate observed in breast cancer [8]. Al- 
though this hypermutated tumor had a cellularity of 90%, 
we observed a set of seven mutations at 17% and a set of 
five mutations at 13% allelic fraction, with both sets representing
statistically different populations (P <10~^, Student's
t test; Figure 4C, inset). One possible explanation is the
presence of two subclones: the seven mutations
at higher allelic fraction would be present in a heterozygous state
in a major founder clone (28% of the cells, 14% of the
DNA) from which a minor clone arose, adding five heterozygous
mutations (26% of the cells, 13% of the DNA).
Among the founder clone mutations, we noticed a BRCA1
nonsense mutation, which may explain the high mutation 
rate observed in this sample. 
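
The conversion between the fraction of mutant DNA and the fraction of cells carrying a mutation, used in the reasoning above, can be written out as a short sketch. It assumes a copy-neutral diploid locus by default; the function name and parameters are hypothetical.

```python
def cell_fraction(allelic_fraction, mutant_copies=1, total_copies=2):
    """Fraction of cells carrying a mutation, from its allelic fraction.

    Assumes every cell in the sample has `total_copies` copies of the
    locus and every mutated cell carries `mutant_copies` mutant copies.
    The defaults describe a heterozygous mutation at a diploid locus.
    """
    return allelic_fraction * total_copies / mutant_copies


# Heterozygous mutations: 14% of the DNA corresponds to 28% of the cells,
# and 13% of the DNA to 26% of the cells.
print(cell_fraction(0.14), cell_fraction(0.13))
# A homozygous mutation at 50% allelic fraction implies 50% of the cells.
print(cell_fraction(0.50, mutant_copies=2))
```

Under these assumptions, an allelic fraction well below half of the tumor cellularity points to a mutation carried by only a subset of the tumor cells.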

The last two patients carried six mutations each. One 
patient with lobular carcinoma had two CDH1 mutations
and one ERBB2 mutation at ~16% allelic fraction,
as well as a distinct set of mutations in PTEN, BRCA2
and PMS2 at ~5% allelic fraction. The observed allelic
fractions are in contrast with the high cellularity (90%) 
and absence of strong rearrangement (SDH = 8.5) in this 



lobular tumor. Assuming that the mutations are not mu- 
tually exclusive, this observation implies that the loss of 
a PTEN allele only appeared recently in the tumor and 
that the majority of the tumor cells had no detectable 
somatic events in the panel of genes investigated. Finally, 
the tumor of one patient, also with low SDH and high 
cellularity, harbored two hallmark mutations at ~50% allelic
fraction (PIK3CA and TP53) probably driving the
initial tumor, but carried four mutations at ~16% allelic
fraction, suggesting the presence of a subclone consisting
of 32% of cells. This study highlights how the differences
in allelic fraction observed within tumors can
reveal subclonal populations and genetic drivers, and 
could be used to monitor treatment and possibly prevent 
future resistance. 

Importance of the germline variants 

Our approach identified 586 inherited germline variants, 
with a median of 140 per patient, 85% of them present 
in dbSNP (Table S8 in Additional file 1 and Additional 
file 2). We first investigated the presence of deleterious
variants in BRCA1/2, which are the most actionable
genes in the clinical setting. We identified three patients
with a predicted deleterious mutation in one of these
genes, of which only one seems truly deleterious [28-30]
(Table S9 in Additional file 1). The BRCA1-Q1355_E1356fs
frameshift mutation is a previously reported deleterious 
mutation [30] and is clinically actionable. Interestingly, 
the mutant allele was selected for in the tumor (allelic 
fraction 94%), indicating a selective advantage. This germ- 
line finding was later confirmed by a Clinical Laboratory 
Improvement Amendments-approved assay after the pa- 
tient consulted with a clinical genetic counselor. 

Inherited variants in DPYD have been associated with 
toxicity to 5-fluorouracil or capecitabine chemotherapy 
[12], which is commonly used in breast cancer treat- 
ment. We identified six patients carrying three variants 
in DPYD with predicted deleterious effects. Three patients
were heterozygous for rs1801160 (minor allele
frequency = 0.04). This single nucleotide polymorphism
defines the DPYD*6 haplotype, which has been associated
with increased toxicity [31]. Two novel missense
variants (K259E and V944A) identified in three patients 
have an unknown significance. Interestingly, a recent 
study indicates that variants in DPYD can actually in- 
crease its metabolic activity, therefore protecting against 
toxicity and decreasing drug efficiency [32]. Until more 
functional experiments are performed, it will be challen- 
ging to unambiguously determine the clinical relevance 
of most inherited DPYD variants. We also identified two 
patients carrying one inactive allele of the gene
(CYP2D6%). However, it is not clear whether this particular
allele, in a heterozygous state, is associated with a reduced
metabolism of tamoxifen; therefore, a change in
drug dosage is not justified. 

More generally, our approach identified many inher- 
ited variants of unknown significance, which should be 
cautiously interpreted. Importantly, in the absence of a 
matched germline sample, some of these variants might 
have been misidentified as tumor-specific events poten- 
tially confounding the rationale for targeted therapy, 
therefore highlighting the importance of sequencing 
matched germline DNA. 
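
The role of the matched germline sample can be sketched as a simple decision rule. The thresholds, class names and example allelic fractions below are hypothetical and chosen only for illustration; they are not the criteria used in this study.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    name: str
    tumor_af: float      # allelic fraction in tumor DNA (0 to 1)
    germline_af: float   # allelic fraction in matched germline DNA (0 to 1)


def classify_origin(v: VariantCall,
                    germline_min_af: float = 0.25,
                    noise_af: float = 0.01) -> str:
    """Label a variant as inherited, somatic or uncertain.

    germline_min_af and noise_af are illustrative cut-offs (assumptions).
    """
    if v.germline_af >= germline_min_af:
        return "inherited"
    if v.germline_af <= noise_af and v.tumor_af > noise_af:
        return "somatic"
    return "uncertain"


if __name__ == "__main__":
    calls = [
        # Germline fractions here are made-up values for the example.
        VariantCall("BRCA1 frameshift", tumor_af=0.94, germline_af=0.50),
        VariantCall("PIK3CA-H1047R", tumor_af=0.26, germline_af=0.00),
        VariantCall("ambiguous call", tumor_af=0.10, germline_af=0.05),
    ]
    for call in calls:
        print(call.name, "->", classify_origin(call))
```

With tumor DNA alone, the first two calls would both appear simply as mismatches to the reference genome, which is the ambiguity the matched germline sample resolves.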

Clinical implications 

Out of the 47 genes sequenced, 24 are classified as ac- 
tionable based on their somatic status (Table 1). These 
genes or the pathway they belong to could be targeted 
by a specific inhibitor, commercially available or under 
investigation (PIK3CA, ERBB2), or are predictive biomarkers
for targeted therapies that are approved or in
clinical trials (BRCA1/2, PTEN). There were 21 patients
whose tumors carried nonsilent mutations or copy number
alterations in 17 of these 24 genes (Figure 4D). Importantly,
three of the patients had tumors with less
than 20% cellularity and in four patients we identified
mutations at an allelic fraction of 10% or lower. We can 
establish the added benefit of our strategy in such cases: 
if we had limited our analysis to the samples with cellularity
higher than 60% (19 samples), which is the inclusion
criterion used by TCGA, we would have identified mutations
in only six patients, for an overall sensitivity of only
31% (6/19 cases). However, by using the UDT-Seq approach,
we identified mutations in actionable genes in 21
of the 38 patients studied, for an overall sensitivity of 55%
(21/38 cases), combining the benefits of less stringent inclusion
criteria and higher assay sensitivity.
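
The arithmetic behind this comparison can be reproduced in a few lines. The sketch below uses only the counts quoted above; the helper name is ours, and the cohort-level figure (6 of 38) is a derived illustration rather than a number reported in the text.

```python
from fractions import Fraction


def detection_rate(n_with_mutation: int, n_tested: int) -> Fraction:
    """Fraction of tested patients with at least one actionable mutation."""
    return Fraction(n_with_mutation, n_tested)


# Counts quoted in the text.
restricted = detection_rate(6, 19)    # samples with cellularity > 60% only
udt_seq = detection_rate(21, 38)      # all samples, UDT-Seq approach
# Derived: patients with a finding under the restricted criterion,
# relative to the whole cohort of 38.
restricted_cohort = detection_rate(6, 38)

if __name__ == "__main__":
    print(f"{float(restricted):.1%}")         # 31.6% (reported as 31%)
    print(f"{float(udt_seq):.1%}")            # 55.3%
    print(f"{float(restricted_cohort):.1%}")  # 15.8%
```

Expressed per cohort rather than per tested sample, the restricted criterion would have returned a finding for roughly one patient in six, against more than one in two with the less stringent inclusion criteria.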

Based on these molecular findings, we then summarized 
the most likely clinical course of action (Table 2). Looking 
at somatic mutations and amplifications, we would have
proposed the use of trastuzumab for seven patients based
on ERBB2 status. Notably, in one of them the ERBB2
gene was not amplified but carried an activating mutation,
which would have been missed through standard Her2
testing. We would have further recommended the enroll- 
ment of 12 patients in a PIK3CA inhibitor clinical study 
due to a mutation in the PI3K/AKT/mTOR pathway. Four
other patients may have been considered as candidates for 
the clinical testing of an FGFR inhibitor. Finally, for seven 
patients, the molecular testing suggests that they could 
each have benefited from PARP, CDK4/6, AKT, ABL2, 
BRAF, JAK or RARA inhibitors. Importantly, we were able
to identify 18 patients who might specifically benefit from 
the advantages of our approach (Table 2). Regarding 
germline mutations, one patient carrying a germline 
BRCA1 mutation underwent genetic counseling and had
her mutation confirmed in a Clinical Laboratory Improve- 
ment Amendments-certified setting. One patient carried a 
deleterious germline CFTR mutation. These types of incidental
findings, not related to breast cancer treatment,
should be returned to the patient according to recent 
guidelines of the American College of Medical Genetics 
[33]. Overall, combining both somatic and germline dis- 
coveries, 25 patients had genetic results potentially in- 
formative for their care, of which 19 would not have been 
identified through routine testing. 
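
The patient counts in this paragraph can be checked against Table 2 with a short tally. The sketch below is our own transcription of the table's "Proposed action" and "UDT-Seq advantages" columns (rows in which both cells are empty are omitted); the variable and function names are illustrative.

```python
from collections import defaultdict

# (patient, proposed action, UDT-Seq advantage); "" marks an empty cell.
ROWS = [
    ("AA1025", "CFTR genetic counseling", "Germline"),
    ("AA1090", "CDK4/6 inhibitor", ""),
    ("AA1090", "FGFR1/2 inhibitor", "CNA"),
    ("AA1106", "Trastuzumab", ""),
    ("AA1106", "PIK3CA inhibitor", "Depth"),
    ("AA1106", "PARP inhibitor", "Depth"),
    ("AA1204", "PIK3CA inhibitor", "Sensitivity"),
    ("AA1204", "Trastuzumab", "CNA"),
    ("AA1222", "Trastuzumab", "CNA"),
    ("AA1247", "Trastuzumab", "CNA"),
    ("AA1247", "", "Depth"),
    ("AA1267", "PIK3CA inhibitor", ""),
    ("AA1277", "BRCA1 genetic counseling", "Germline"),
    ("AA1277", "FGFR1/2 inhibitor", "CNA"),
    ("AA948", "PIK3CA inhibitor", "Sensitivity"),
    ("AA952", "PIK3CA inhibitor", ""),
    ("AA952", "PARP inhibitor", ""),
    ("AA952", "JAK inhibitor", ""),
    ("AA952", "5-FU toxicity", "Germline"),
    ("AA957", "PIK3CA inhibitor", ""),
    ("AA1515", "PIK3CA inhibitor", ""),
    ("UCI1546879", "PIK3CA inhibitor", ""),
    ("UCI1689380", "RARA inhibitor", ""),
    ("UCI1689380", "Vemurafenib", ""),
    ("UCI1908503", "PIK3CA inhibitor", ""),
    ("UCI1908503", "Trastuzumab", "CNA"),
    ("UCI1951813", "PIK3CA inhibitor", "Sensitivity"),
    ("UCI2076630", "Trastuzumab", "CNA"),
    ("UCI2224680", "PARP inhibitor", "Depth"),
    ("UCI2564879", "PIK3CA inhibitor", ""),
    ("UCI2649875", "AKT inhibitor", ""),
    ("UCI2649875", "FGFR1/2 inhibitor", "CNA"),
    ("UCI4216548", "FGFR1/2 inhibitor", ""),
    ("UCI8965412", "Trastuzumab", "CNA"),
    ("UCI8965412", "Imatinib", "CNA"),
    ("UCI1804937", "5-FU toxicity", "Germline"),
    ("UCI2008866", "5-FU toxicity", "Germline"),
    ("UCI3564897", "PIK3CA inhibitor", "CNA"),
]


def patients_by(column_index: int) -> dict:
    """Group patient identifiers by the value found in one column."""
    groups = defaultdict(set)
    for row in ROWS:
        if row[column_index]:
            groups[row[column_index]].add(row[0])
    return groups


by_action = patients_by(1)
by_advantage = patients_by(2)
all_patients = {row[0] for row in ROWS}
with_advantage = set().union(*by_advantage.values())

if __name__ == "__main__":
    print(len(all_patients))                    # 25
    print(len(with_advantage))                  # 18
    print(len(by_action["Trastuzumab"]))        # 7
    print(len(by_action["PIK3CA inhibitor"]))   # 12
    print(len(by_action["FGFR1/2 inhibitor"]))  # 4
```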

Discussion 

An increasing number of diagnostic companies and health- 
care centers are proposing to perform tumor genetic pro- 
filing to support precision cancer care. Assays providing 
both deep and genome-wide or broad coverage are not yet 
available or currently justified in a clinical setting. There- 
fore, one should look directly at patient benefit and clin- 
ical utility to select an appropriate strategy. We still have a 
limited understanding of the role of most proteins even in 
pathways deemed actionable. Therefore, until more clinical 
evidence is provided, broad or genome-wide sequencing is 
likely to unveil mutations for which a clear therapeutic 

Table 2 Summary of the primary course of action likely to result from the molecular testing

Patient       SNP or mutation (allelic fraction)  Proposed action            UDT-Seq advantages^a
AA1025        rs113993959 (Het)                   CFTR genetic counseling    Germline
AA1090        CDKN2A-A85D (66%)                   CDK4/6 inhibitor
              FGFR1 amplification                 FGFR1/2 inhibitor          CNA
AA1106        ERBB2-L755S (17%)                   Trastuzumab
              PTEN-frameshift (5%)                PIK3CA inhibitor           Depth
              BRCA2-I1418T (4%)                   PARP inhibitor             Depth
AA1204^b      PIK3CA-H1047R (26%)                 PIK3CA inhibitor           Sensitivity
              Her2 amplification                  Trastuzumab                CNA
AA1222^b      Her2 amplification                  Trastuzumab                CNA
AA1247^b      Her2 amplification                  Trastuzumab                CNA
              ERBB2-D769H (5%)                                               Depth
AA1267        PIK3CA-H1047R (45%)                 PIK3CA inhibitor
AA1277        rs80357508 (Het)                    BRCA1 genetic counseling   Germline
              FGFR2 amplification                 FGFR1/2 inhibitor          CNA
AA948         PIK3CA-E545K (34%)                  PIK3CA inhibitor           Sensitivity
AA952         PIK3CA-E545K (16%)                  PIK3CA inhibitor
              BRCA1-W306* (18%)                   PARP inhibitor
              BRCA1-E550K (13%)
              JAK2-S131L (16%)                    JAK inhibitor
              JAK3-I386M (13%)
              rs1801160 (Het)                     5-FU toxicity              Germline
AA957         PIK3CA-E542K (28%)                  PIK3CA inhibitor
AA1515        PIK3CA-E545K (70%)                  PIK3CA inhibitor
UCI1546879    PIK3R1-K204E (30%)                  PIK3CA inhibitor
UCI1689380    RARA-337 T (14%)                    RARA inhibitor
              BRAF amplification                  Vemurafenib
UCI1908503^b  PIK3CA-H1047R (40%)                 PIK3CA inhibitor
              Her2 amplification                  Trastuzumab                CNA
UCI1951813    PIK3CA-E545K (7%)                   PIK3CA inhibitor           Sensitivity
UCI2076630^b  Her2 amplification                  Trastuzumab                CNA
UCI2224680    BRCA2-L1829F (2%)                   PARP inhibitor             Depth
UCI2564879    PIK3R1-K204E (30%)                  PIK3CA inhibitor
UCI2649875    AKT1-L52R (63%)                     AKT inhibitor
              FGFR1 amplification                 FGFR1/2 inhibitor          CNA
UCI4216548    FGFR1-D683H (13%)                   FGFR1/2 inhibitor
UCI8965412^b  Her2 amplification                  Trastuzumab                CNA
              ABL2 amplification                  Imatinib                   CNA
UCI1804937    rs1801160 (Het)                     5-FU toxicity              Germline
UCI2008866    rs1801160 (Het)                     5-FU toxicity              Germline
UCI3564897    PIK3CA amplification                PIK3CA inhibitor           CNA

5-FU, 5-fluorouracil; SNP, single nucleotide polymorphism. ^a Depth, accurate calls at low allelic fraction (<10%); sensitivity, accurate calls in heterogeneous samples;
CNA, inference of copy number alterations; germline, inclusion of matched germline DNA. ^b Her2-positive determined through standard of care.

rationale is not yet available or is poorly understood. In contrast,
the use of deep sequencing of a restricted panel of
genes increases the sensitivity to detect well-known and 
actionable mutations, which can have a greater impact in 
the clinic. For these reasons, deep sequencing of a re- 
stricted gene panel is likely to benefit the greatest number 
of patients today. Using our UDT-Seq approach, we iden- 
tified potentially actionable mutations in 14/19 patients 
whose tumor samples had less than 60% cellularity and 
discovered actionable mutations present at 10% allelic 
fraction or less in four patients, some of whom had tumors
with high malignant cellularity. UDT-Seq offers a
quantitative measurement of the allelic fraction of
the mutations, providing information about the biology of
the tumor. For example, we observed a field effect in tumors
harboring TP53 mutations and the presence of subclonal
PIK3CA mutations or of multiple mutated clones
in three tumors, probably resulting from their evolution.
Establishing the clinical utility of these new data will require
specific trials to show that targeting resistant subclones or
field effects improves outcomes in both the curative and
palliative settings.
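
To make the link between allelic fraction and subclonality concrete, the sketch below converts an observed allelic fraction into an estimated fraction of tumor cells carrying the mutation. It assumes a heterozygous mutation in a region of normal copy number unless told otherwise; the function names, the 0.5 cut-off and the example purities are hypothetical and do not come from the study.

```python
def cancer_cell_fraction(allelic_fraction: float,
                         purity: float,
                         mutant_copies: int = 1,
                         tumor_copies: int = 2,
                         normal_copies: int = 2) -> float:
    """Estimate the fraction of tumor cells carrying a somatic mutation.

    purity is the fraction of malignant cells in the sample. The default
    copy numbers describe a heterozygous mutation in a diploid region.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    average_copies = purity * tumor_copies + (1 - purity) * normal_copies
    return allelic_fraction * average_copies / (purity * mutant_copies)


def looks_subclonal(allelic_fraction: float, purity: float,
                    threshold: float = 0.5) -> bool:
    """Flag a mutation estimated to be present in under `threshold` of tumor cells."""
    return cancer_cell_fraction(allelic_fraction, purity) < threshold


if __name__ == "__main__":
    # Hypothetical example: 7% allelic fraction in a sample with 60% tumor cells.
    print(round(cancer_cell_fraction(0.07, 0.60), 3))
    print(looks_subclonal(0.07, 0.60))
```

The same 7% allelic fraction would be compatible with a clonal mutation in a sample containing only about 14% tumor cells, which is why cellularity has to be considered alongside the allelic fraction.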

Traditionally, tumor-specific markers are investigated 
in the tumor specimen only. While this may be sufficient 
for protein markers, a DNA mutation is identified as a 
mismatch to the reference human genome and could 
correspond either to an inherited variant or somatically 
acquired mutation in the tumor. Only the sequencing of 
matched germline DNA can confirm that the variant is 
somatic, providing a better rationale for the use of tar- 
geted therapy, or inherited, providing important infor- 
mation for the care of the patient and their relatives. 
Finally, the use of matched germline DNA sequencing 
facilitates the detection of mutations at low allelic frac- 
tion [10,34], which, as discussed above, is likely to be ex- 
tremely important for optimal implementation in clinical 
care. It is typically feasible to obtain a blood or buccal 
sample along with the tumor or biopsy sample being in- 
vestigated, without excessive burden. 
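
One way to see how a matched germline sample can help at low allelic fraction is to use it as a site-specific estimate of background error. The sketch below applies a one-sided binomial test to the tumor read counts; this is a generic illustration under our own assumptions (pseudocount, significance cut-off, function names) and is not the variant-calling method of references [10,34].

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of k successes in n trials (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


def is_somatic_candidate(tumor_alt: int, tumor_depth: int,
                         germline_alt: int, germline_depth: int,
                         alpha: float = 1e-6) -> bool:
    """Test whether tumor alternate reads exceed the germline-derived error rate.

    The error rate uses a pseudocount so that it is never exactly zero;
    alpha is an arbitrary illustrative cut-off.
    """
    error_rate = (germline_alt + 1) / (germline_depth + 2)
    return upper_tail(tumor_alt, tumor_depth, error_rate) < alpha


if __name__ == "__main__":
    # Hypothetical read counts at 1,000x depth in both samples.
    print(is_somatic_candidate(20, 1000, 1, 1000))  # 2% allelic fraction
    print(is_somatic_candidate(3, 1000, 1, 1000))   # 0.3% allelic fraction
```

In this toy example, 20 alternate reads out of 1,000 stand out clearly from a background of about 2 expected error reads, whereas 3 alternate reads do not.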

Importantly, the adoption of such transformative diag- 
nostic assays in the clinic needs to include physician educa- 
tion and training and be associated with the establishment 
of molecular tumor boards in academic centers. These mo- 
lecular tumor boards are not focused on a particular can- 
cer by site of origin, but rather on the molecular markers 
identified. The presence of basic scientists with expertise in 
the altered pathways also improves the clinical interpret- 
ation. Indeed, the role and clinical significance of muta- 
tions located in less commonly mutated exons, genes or 
in the noncoding portions of the genome [35] remain to 
be established. Interpreting these variants of unknown sig- 
nificance, whether inherited or somatic, is the most con- 
troversial and difficult aspect of clinical sequencing. 
Despite attempts to consolidate variants, mutations, and
clinical information in public databases, molecular tumor
board members must currently perform extensive litera- 
ture searches to predict the impact of a mutation. In our 
study, missense mutations in ERBB2 were reported as ac- 
tivating by only a few published studies, suggesting their 
relevance for trastuzumab or lapatinib treatment. A simi- 
lar challenge exists for the interpretation of polymor- 
phisms in drug metabolizing genes, which will benefit 
from the efforts of the Pharmacogenomics Research Network
[36]. Finally, such a precision medicine strategy is
sensible only if it benefits patients. For inherited variants,
access to clinical genetic counseling is critical to interpret
the results in the context of a complete family
history. Similarly, targeting genes with somatic mutations
using an investigational drug requires access to a clinical
trial, or reimbursement for off-label use of targeted drugs
with clinical outcomes captured in a clinical registry study.

Conclusion 

Our study evaluates the potential benefits of the UDT- 
Seq of 47 selected genes for breast cancer care. We show 
that our assay identifies actionable findings, both inher- 
ited variants and somatic mutations, in 25 out of 38 
samples. In particular, the distinguishing features of our assay -
inclusion of germline DNA, identification of copy number
variants, high coverage depth and sensitivity to identify
somatic mutations at low allelic fraction - would have
been directly beneficial to 18 patients. As high-throughput
sequencing starts to be used in clinical care, its establish- 
ment as a routine diagnostic assay will require progress on 
many fronts: demonstration of technical validity and clin- 
ical utility, education of physicians and trainees, and co- 
operation with pharmaceutical and insurance companies 
to increase drug accessibility. 